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Abstract 

Boundary element methods produce dense linear systems that can be accelerated via multipole expansions. Solved 
with Krylov methods, this implies computing the matrix-vector products within each iteration with some error, at an 
accuracy controlled by the order of the expansion, p. We take advantage of a unique property of Krylov iterations 
that allow lower accuracy of the matrix-vector products as convergence proceeds, and propose a relaxation strategy 
based on progressively decreasing p. In extensive numerical tests of the relaxed Krylov iterations, we obtained speed- 
ups of between 2.lx and 3.3x for Laplace problems and between 1.7x and 4.0x for Stokes problems. We include an 
application to Stokes flow around red blood cells, computing with up to 64 cells and problem size up to 13 Ik boundary 
elements and nearly 400k unknowns. The study was done with an in-house multi-threaded C-i-H- code, on a hexa-core 
CPU. The code is available on its version-control repository, https;//github.com/barbagroup/fmm-bem-relaxed 

Keywords: Laplace equation, Stokes equation, boundary integral equation, boundary element method, fast multipole 
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1. Introduction 

The boundary element method (bem) is popular in applied mechanics for solving problems governed by the equa¬ 
tions of Laplace, Stokes, Helmholtz and Lame. Applications include electrostatics, low-Reynolds-number flow, acous¬ 
tics and linear elasticity and span all scales from biomolecules, micro-electromechanical and biomedical devices to 
wind turbines, submarines and problems in aerospace and geodynamics. The key feature of bem is formulating the 
governing partial differential equations in equivalent boundary integral equations, and discretizing over the boundaries. 
This process reduces the dimensionality of the problem (three-dimensional problems are solved on two-dimensional 
surfaces), but generates dense systems of algebraic equations. The computational complexity of dense solvers, scaling 
as 0{N^) for direct methods and 0(N^) for iterative methods, frustrated large-scale applications of bem until the late 
1990s, when bem researchers began working out how to incorporate fast algorithms ID. Treecodes and fast multipole 
methods (fmm) reduce the complexity of bem solutions to 0{N\ogN) and 0{N), respectively, although often with 
serious programming effort. Nevertheless, large-scale bem simulations are now possible, especially given the parallel 
scalability of the fmm 121 |3]- 

Accelerating bem solutions with emm hinges on seeing the dense matrix-vector products done within each iteration 
of the linear solver, as A-body problems. Gauss quadrature points on source panels and collocation points on target 
panels interact via the Green’s function of the governing equation, similar to how charges, particles or masses interact 
under electrostatic or gravitational potentials. In the same way, far-away sources can be clustered together and rep¬ 
resented by series expansions to calculate their contribution on a target point with controllable accuracy. As a result, 
the matrix-vector products are computed with some error. To ensure convergence of the iterative solver, or achieve a 
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desired accuracy in the solution, we might require the order of the series expansions, p, to be sufficiently large. But 
Krylov methods have the surprising property of only requiring high accuracy on the first iteration, while accuracy 
requirements can be relaxed on later iterations. This property offers the opportunity to use different values of p in the 
FMM as the iterations progress to convergence, reducing the total time to solution. 

This paper presents a relaxation strategy for fast-multipole boundary element methods consisting of decreasing 
the orders of expansion p as Krylov iterations progress in the solver. We tested extensively using the Laplace and 
Stokes equations, and also with an application of Stokes flow around red blood cells. The aim of the study was to 
show that a bem solution with inexact Krylov iterations converges, despite low-accuracy matrix-vector multiplications 
at later iterations, and to find out the speed-ups that can be obtained. We wrote an in-house multi-threaded code in 
that enable experimenting in a variety of scenarios|^ 


2. Methods for the integral solution of elliptic equations using inexact GMRES 


2.7. Boundary-integral solution of the Laplace equation 

To write the Laplace equation, V^(^(x) = 0, in its integral formulation, we use the classical procedure of multiply¬ 
ing by the Green’s function and integrating, applying the divergence theorem of Gauss and the chain rule, then dealing 
with singularities by a limiting process. This results in 
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where G - IjAnr is the free-space Green’s function for the Laplace equation (V^G = -6), ^ represents the partial 
derivative in the direction normal to the boundary surface, and the (Cauchy principal-value) integrals are on the 
boundary T of the domain. (The details of the derivation can be found in Ref. ||4|.) The boundary element method 
(bem) consists of discretizing the boundary into surface panels and enforcing Equation Q on a set of target points 
(collocation version). In a typical bem, surface panels take a constant value fj, and the surface integrals become sums 
over N flat surface elements, Ty, resulting in the following discretized equation: 
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The values of the potential or its normal derivative on each panel are known from boundary conditions, resulting 
in either first-kind or second-kind integral equations. Finding the remaining unknowns requires solving a system of 
linear equations Ax = b, where the elements of the coefficient matrix are 
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and b is formed with the known terms: e.g., if f is given on panel j, then (pj dGijIdhj dTj will appear in the term 
bi on the right-hand side of the linear system. 

Expressing Gij and dGijIdhj in terms of 1/r and hj ■ V(l/r) 
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The next steps are to apply an appropriate numerical integration scheme in order to generate all the terms of the 
coefficient matrix, and subsequently solve the linear system of equations, as described below. 


^The code is available for replication of our results at https://github.com/barbagroup/fmm-bem-relaxed 
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Figure 1: Depending on the distance between a source panel and target point, we use different numerical, analyti¬ 
cal or semi-analytical integration methods, balancing computational efficiency and accuracy near singularities. The 
threshold distance between far and near-singular regions is 2 -\/2S j, where S j is the surface area of the source panel. 


2.2. Boundary-integral solution of the Stokes equation 

The Stokes equation for a flow at very low Reynolds number, /zV^u - Vp (where p is the viscosity and p the 
pressure), can be rewritten in its integral formulation by means of a similar process as that described above for the 
Laplace equation. But it is a vector equation and its fundamental solutions are tensors. The boundary integral form of 
the the Stokes equation is 
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where u is the velocity vector satisfying the Stokes equation (Einstein indicial summation implied), with cr the corre¬ 
sponding stress tensor and i - cr ■ h the traction, vectors x and xq are two distinct points in the domain, and G and T 
are the stokeslet and stresslet fundamental solutions; 
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Indices i, j, k denote here the Cartesian-tensor components and 5,7 is the Kronecker delta. Discretizing the bound¬ 
ary with N surface panels results in sums that we now number with the index J. The discretized form with constant 
surface panels becomes 
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2.3. Numerical, semi-analytical and analytical integration methods 

The boundary integral formulations all demand that we compute integrals of the type Kp dT^, where K/j - 
K(x7 - X,) is the kernel, the point Xj is on the panel surface Tj and the point x, is a target or evaluation point. Because 
the kernel IK is often singular, we need specihc approaches depending on the distance Xj - x,. Where the target point 
is far enough from the surface Ty, simple quadrature with a few Gauss points will suffice. As the target point nears 
the source panel (with the dehnition of “near” to be determined), we need high-accuracy quadrature. Finally, in the 
case where the target point is on the source panel, the integral is (close to) singular and we must use analytical or 
semi-analytical methods. Figure [T] illustrates the three situations. 
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Applying Gauss quadrature to the integrals appearing in the coefficients of the boundary element discretization of 
the Laplace equation, Q and Q, for example, gives 
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with qk the area-normalized Gauss quadrature weights and S j the surface area of panel Lj . To control the accuracy of 
the numerical integration, we vary the number of quadrature points, using for example K - 4 for targets that are far 
from the source panel and K ^ 20 for near-singular situations. The near-singular region is within a distance of 2 yj2.S j, 
a criterion that we settled on after testing with several choices using a panel’s characteristic length scale as factor. 
When the target point is on the source panel, the standard approach is to use analytical or semi-analytical methods 
for the singular and hyper-singular integrals over the panels. For the Laplace equation, we used a semi-analytical 
method. It applies the technique first presented in the classic work of Hess and Smith ||5] p. 49, ff.] for decomposing 
the integral into a sum of integrals over triangles formed by the projection of the target point on the panel plane, 
and the panel vertices. Using polar coordinates, the integration over the radial component can be done analytically 
and the integration over the angular component is done by quadrature. Several analytic integration techniques are at 
our disposal for dealing with the singular integrals from boundary element methods. Explicit expressions for these 
integrals over flat triangular domains result in recursive formulae on the edges of the integration triangle. These are 
available for Laplace potentials Q and linear elastic surface potentials Q. We obtained the analytic integrals for the 
Stokes equation from Fata’s formulas for linear elasticity, after setting the Poisson ratio to 1 /2. 


2.4. Krylov subspace methods 

For large linear systems of equations Ax = b, direct solution is generally impractical and iterative solution methods 
are preferred. Krylov subspace methods derive from the Cayley-Hamilton theorem, which states that you can express 
the inverse of a matrix A as a linear combination of its powers. A Krylov subspace is spanned by the products of b and 
powers of A; to order-r, this is; Kr{A,b) - &pw{b,Ab,A^b, ...,A'' '!?). Krylov methods include the conjugate gradient 
method, the biconjugate gradient stabilized method (bicgstab) and the generalized minimal residual method, gmres 
M , which we use. The greatest cost per iteration is the matrix-vector product (mat-vec), w A- zj, taking O(N^) time 
in a direct implementation. However, given the structure of the coefficient matrix in boundary element methods, this 
operation can be reduced to 0(N) time using, for example, the fast multipole method. The key is understanding the 
matrix-vector product as an A-body problem. Let’s consider the first-kind integral problem for the Laplace equation. 
In we see that the matrix coefficients are Applying Gauss quadrature to obtain the coefficients gives 
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The inner sum is over the Gauss quadrature points (only a few per panel) and the outer sum is over the integration 
panels. We list the pseudocode of the full mat-vec in the Appendix as Algorithm[2 Taking all the quadrature points 
together as the set of “sources” and all the collocation points on the panels as the set of “targets,” the algorithm is 
reduced to two for-loops instead of three, making the analogy with an A-body problem more clear. 


2.5. Fast multipole method 

Fast multipole methods were invented to accelerate the solution of A-body problems, that is, problems seeking 
to determine the motion of A bodies that interact with each other via a long-distance effect (like electrostatics or 
gravitation). A direct approach to such a problem takes 0{N^) time to compute. The first fast algorithms for A-body 
problems ladoi combined two ideas: (1) approximating the effect of groups of distant bodies with a few moments 
(of the charges or masses), and (2) using a hierarchical sub-division of space to determine the acceptable distances 
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to apply these approximations. These ideas produced the treecode algorithm, with <9(A^logA^) time to compute. The 
fast multipole method O introduces a third key idea that leads to 0(N) scaling: allowing groups of distant bodies to 
interact with groups of targets, by means of a mathematical representation called local expansion. 

A typical A-body problem evaluates a potential 0 on / = 1,2, • • • ,N bodies using the following expression 

N N 

(f>i^Y^mj-K(Xi,yj) = Y^Kijtrij, (13) 

1=0 1=0 

where = K(x, , yj) is referred to as the kernel, and the potential is a solution of an elliptic equation, e.g., the Poisson 
equation F, = for gravitation. The expression in ( [T3| ) is analogous to that for one row of the bem mat-vec in 

Equation ( [T^ , taking all Np ■ K Gauss points collectively. The first step in the fmm acceleration of bem is to group 
the Gauss quadrature points (i.e., the “sources”) into clusters, and represent their influence via multipole expansions 
at the cluster centers. If using Taylor expansions, for example, truncated to the first p terms, the potential at a point is 
approximated by 

P , ric 
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where k = {ki,k 2 ,ki} is a multi-index, k! = k\\k 2 \kj,\, y*‘ = ^ ~ *-1*® derivative operator and 

p is the order of the expansion. The right-most terms are the multipoles, i.e., powers of distances with respect to 
the cluster center, and they can all be pre-calculated for a set of sources. Forming the clusters consists of recursively 
sub-dividing the spatial domain until a limit number of sources Acrit remains in each box, resulting in an octree 
structure. The step of computing the multipole expansions for each source cluster at the deepest level of the tree is 
referred to as the particle-to-multipole operation, p2m. These multipole expansions are then translated and added to 
represent larger clusters (at higher levels of the tree) in the multipole-to-multipole operation, m2m. The process just 
described is called the upward sweep. At this point, a treecode algorithm evaluates the potential on target points, 
performing multipole-to-particle operations, m2p —in the bem, the multipoles represent clusters of quadrature points 
and the potential is evaluated on collocation points, as illustrated in Figure As implied in Equation([T4l), this is 
an approximation; and as implied in Figure the approximation is acceptable for remote target points only. The 
parameter that dictates whether we make the approximation is the multipole acceptance criterion, mac, denoted by 
6 and enforced as the maximum allowed ratio between cluster size and distance between targets and cluster center. 
When the mac is not satisfied, sources and targets interact directly via (called particle-to-particle operation, p2p). 
The key to achieving the optimal (9(A) scaling are local expansions, representing a group of target points; thus, the 
EMM adds three operations to the algorithm: the multipole-to-local transformation (m 2 l), the local-to-local translation 
(l2l) and the local-to-particle evaluation (l2p). 

The Cartesian-expansion m 2 l operation scales as (9(p®), which quickly becomes expensive when requiring higher 
accuracy via higher p. Using spherical-harmonic expansions reduces this cost to 0(p‘^), but the mathematical deriva¬ 
tions and expressions become more cumbersome. For the sake of brevity, we omit all the details and write the final 
form for the spherical expansions of the Faplace potential—denoted here by <l>(x,) to differentiate with the spherical 
co-ordinate 0—as follows 
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Here, M'” and L™ denote the multipole and local expansion coefficients; Y”' is the spherical harmonic function; (r, 6, (p) 
and (p,a,/3) are the distance vectors from the center of the expansion to points x, and Xj, and qj are the weights of 
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Figure 2: First step in viewing the influence of boundary elements as an A^-body problem accelerated by the fast 
multipole method; Gauss quadrature points in a region of the surface triangulation are grouped together in a cluster, 
and their contribution to the potential at a remote target point (white circle) is considered via a series expansion at the 
cluster center. 


sources. For our purposes, we just want to make clear the parallel between a fast A^-body algorithm and a fast bem 
matrix-vector product via this necessarily abbreviated summary of the fmm. 

2.6. Inexact Krylov iterations and relaxation strategies 

Accelerating a bem solution with fmm implies computing the matrix-vector products with some error, and the 
parameter controlling this accuracy is the order of multipole expansions, p. Two natural questions to ask are how 
large does the value of p need to be to ensure that the iterative solver will still converge and what’s the impact on 
the accuracy of the converged solution due to the error on the computed mat-vec. We might expect to choose that 
value of p that ensures convergence and desired accuracy, and apply it evenly for all iterations. But it turns out that 
Krylov iterations have a surprising property; Iht first iteration needs to be computed with high accuracy, but accuracy 
requirements can be relaxed for later iterations. Translation operators in the fmm can scale as 0{p^) or Oip'^) —using 
Cartesian or spherical expansions, respectively—, so this property of Krylov methods could offer the potential for 
further accelerating the bem solution. Bouras and Fraysse ifT^ fTSll studied the effect of inexact Krylov iterations on 
convergence and accuracy via numerical experiments. They found that if the system matrix is perturbed, so we are 
computing (A -H AAj^jz on each iteration, and the perturbation stays nearly equal in norm to 77||A||, then the computed 
solution will have an error of the same order, rj. This is the situation when simply computing the mat-vec within the 
limits of machine precision. But they also showed the more surprising result that the magnitude of the perturbations 
AAi can be allowed to grow as the iterations progress. They define a relaxation strategy whereby, for a desired 
final tolerance 77 in the solution of the system, the mat-vec s in each iteration are computed with a coefficient matrix 
perturbed with AA^, where if r^ is the residual at step k, 

IIAA.II = e,||A||, g.^minf . l). (17) 

\min(||rn||, 1 ) j 

The matrix perturbation is always larger than the target tolerance 77 , and s* increases when r^ decreases (without sur¬ 
passing 1). In other words, the accuracy of the system mat-vecs are relaxed as iterations proceed. This relaxation 
strategy led to converged solutions with gmres, cg and bicgstab, using a variety of test matrices—often, and remark¬ 
ably, in about the same number of iterations as a non-relaxed solution. In sum, Krylov methods proved to be robust 
to inexact mat-vecs and only the first Krylov vectors need to be computed accurately. The numerical evidence is also 
supported by theoretical studies Dma. 
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In the fast multipole method, we have error bounds available for the approximations made in the various operations 
that make up the algorithm. Using the spherical-harmonics expansion for the Laplace kernel, for example, the error is 
bounded as follows 

P n 

n=Q m=-n 


^ Yji=l 9i / 

~ r-a [rJ 


(18) 


where a isthe cluster radius and r is the distance between the multipole center and the target. The above inequality 
is given in Ref. HH p. 55] with label (3.38), and proved using the triangle inequality. This reference also gives 
similar bounds for the translation and evaluation operators. In the particular case that rja > 2 (the distance between a 
multipole center and target is at least twice the cluster radius), the number of terms needed to obtain a given accuracy 
e is 


p~r-log2(e)l. (19) 

In combination with Equation ([T^, we thus have an explicit relation between the allowed perturbation on the mat-vec 
for the inexact Krylov iterations to converge, and the order of the multipole expansion in the fmm. Although the fmm 
error bounds are known to be rather loose, we use this approach for a conservative relaxation of p, and we expect that 
with experience and more detailed studies, the relaxation strategy could be pushed to give better speed-ups in specific 
applications. 

3. Results and discussion 

Our results include numerical experiments with bem solutions for the Laplace and Stokes equations, including an 
application to Stokes flow around red blood cells. The bem solver is accelerated with a fast multipole method using 
spherical expansions, implemented in a code designed to use boundary elements (panels) as the sources and targets, 
and written in multi-threaded To facilitate reproducibility of our research, the code is made available via its 
version-control repository]^ under an MIT license. The paper manuscript is also available via its repository]^ which 
includes running scripts and plotting scripts to produce the included figures (see each figure caption for details). We 
ran all our tests on a lab workstation with an Intel Core i7-5930K hexa-core processor and 32GB RAM. Using a 
test with the Laplace kernel, we confirmed that our fmm code scales as 0(N) by timing several runs with increasing 
problem size; see Figure]^ We also found that the preconditioned cases take fewer iterations but more time to converge 
than the unpreconditioned ones, because the residuals after the first several iterations in the preconditioned cases are 
greater. For an inexact gmres iteration, a bigger residual leads to a higher required p, which offsets the benefit of 
using fewer iterations. Therefore, we performed these tests without preconditioning. The subsequent experiments 
investigate the use of a Krylov relaxation strategy by reducing the order of multipole expansions, p, as the iterations 
progress to convergence, determining the speed-up provided by such a strategy for different scenarios and problem 
sizes. 

3.1. Inexact GMRES for the solution of the Laplace equation 

To start, we studied grid convergence comparing numerical results with the analytical solution using a sphere 
with constant potential and charge on the surface: <p - df/dh = 1. To make surface triangulations of a sphere 
with increasing refinement, we started with an 8-triangle closed surface, then split recursively each triangle into four 
smaller ones. Figure ]^ shows two example discretizations. We solved the boundary-element problem by collocation 
in both the first-kind and second-kind integral formulations, using a standard gmres with fast-multipole-accelerated 
mat-vecs and the semi-analytical integrals for the singular terms. For the far-held approximations, we used spherical- 
harmonic expansions with the following parameters in the fmm: 0 mac = 0.5, p = 12; the number of Gauss points was 
k - A- and the tolerance in the iterative solver was 10“^. Figure ]^ shows the resulting convergence for both hrst-kind 
and second-kind formulations of the boundary element method on a sphere. The observed order of convergence is 
0.76 for the Ist-kind formulation and 1.02 for the 2nd-kind formulation, computed with the three points in the middle 


‘https://github.com/barbagroup/fmm-bem-relaxed 
^ https://github.com/barbagroup/inexact-gmres 
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Figure 3: Scaling of the fmm code with respect to problem size N, using a Laplace kernel with spherical expansions, 
p - 5 and A^crit = 126. The tests ran on a lab workstation using six cpu threads. The dotted line shows the expected 
0{N) scaling. Plotting script and figure available under CC-BY IflTl . 



(a) 128 panels (b) 2048 panels 

Figure 4; Triangular discretizations of a spherical surface. 


of each line. We also observed that the finest mesh detracts from the convergence line with p — 10, as the thinner 
line shows in Figure in that case, the discretization error of a simple geometry like a sphere is very small and the 
error of the EMM-accelerated mat-vecs overtakes. This convergence analysis gives confidence in our bem code, the 
singular/near-singular integral calculations, and the far-held approximation using the fmm. 

Next, we looked at the following test to see how the residual changes as the gmres iterations proceed and what 
value of p is required in the FMM-accelerated mat-vecs to continue convergence, according to Equation ( [T9) . We 
discretized a sphere with 32,768 surface triangles and solved a hrst-kind integral equation using a solver tolerance 
of 10 ® with an initial value of p set to 8 and 12. As the residual gets smaller, the value of p needed to maintain 
convergence of the solver drops, and a low-p of just 2 is sufficient by the ninth iteration for pinitiai = 8 and by the 
fourth iteration for pinitiai = 12. This offers the potential for substantial speed-ups in the calculations, because the 
translation operators of the fmm scale from Oip'^) for spherical harmonics to 0{p^) for Cartesian expansions. A more 
accurate hrst iteration results in a faster drop of required-p and fewer iterations. But we note that only the far-held 
evaluation can be sped-up with the relaxation strategy, which means that the correct balance between near held and 
far held in the fmm could change as we reset p in the later iterations. 

To hnd out the potential speed-up, we compared the time to solution for different cases with and without the 
relaxation strategy. Using three surface discretizations, we solved the boundary-element problem with 1st- and 2nd- 
kind formulations, with a multi-threaded evaluator on 6 cpu cores. In each case, we were careful to set the value 
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Figure 5: Convergence of Ist-kind (solid line) and 2nd-kind (dotted line) solvers for the Laplace equation on a sphere, 
using a gmres with FMM-accelerated matrix-vector products, with parameters: 0 mac = 0.5, p - 12, k - A and solver 
tolerance of 10“^ (no relaxation). For the Ist-kind formulation, we also used p = 10 (thinner solid line) to show a 
degradation of grid-convergence with lower p. The vertical axis is the L^-norm of the relative error with respect to 
the analytical solution for a constant potential or charge on the surface: (ft - dipidh - 1. Plotting script and figure 
available under CC-BY ifTSl . 



of A^crit to minimize the time to solution of the particular test case. The detailed results are given in Tables and 
1^ Figure 1^ shows the speed-up in the time spent solving the linear system of equations to the specified tolerance. 
As indicated in the caption, we used an initial value of p = 10 and a solver tolerance of 10“^. We also repeated 
the experiments with less stringent accuracy settings of p = 8, and solver tolerance 10“^. In that case, the speedups 
are reduced (maximum observed speedup of 2.22 for the Ist-kind formulation), but we decided to show the higher 
accuracy tests, given the observed degradation of grid-convergence with lower p, as seen in Figure]^ 

The results on Tablesandshow a speed-up of between 2.lx and 3.3x for the Ist-kind integral formulation 
and between l.lx and 1.35x for the 2nd-kind formulation. The rightmost column shows the number of iterations to 
converge: we need 3 or less iterations for the 2nd-kind formulation, which explains why we have such low speedup 
compared with the Ist-kind formulation. These tests also taught us that one has to give up on the idea of partitioning 
the domain between a near-held and a far-held in a way that balances the time spent computing each one—an accepted 
idea in fmm applications. When relaxing the accuracy of the gmres iterations, the time taken to compute the far held 
decreases signihcantly. This means that to minimize time-to-solution when using relaxed gmres, the near and far helds 



Iterations 



Figure 6: In a test using a sphere discretized with 32,768 triangles, with k = 4, pinitiai = 8 (left plot) and pinitiai = 12 
(right plot), the residual ||r*|| (solid line, left axis) decreases with successive gmres iterations while the necessary p 
(open circles, right axis) to achieve convergence drops quickly. Plotting script and hgure available under CC-BY ifTSl . 
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Figure 7: Speed-up using a relaxation strategy for three different triangulations of a sphere (N is the number of 
surface panels), using Ist-kind and 2nd-kind integral formulations. Initial p = 10, k = 4, solver tolerance 10“^. Time 
is measured by averaging the solving time of three identical runs. (Multi-threaded evaluator running on 6 cpu cores.) 
Plotting script and figure available under CC-BY IfTSl . 


N 

Non-Relaxed 

A^CRIT 4olve 

Relaxed 

M:RIT 4olve 

Speed-up 

Number 
of iterations 

8192 

400 

4.93 

175 

2.32 

2.13 

11 

32768 

400 

20.32 

200 

9.45 

2.15 

13 

131072 

400 

137.84 

200 

41.32 

3.34 

22 


Table 1: Speed-ups for the relaxation strategy on a Laplace Ist-kind integral solver, p — 10, k = 4, solver tolerance 

lo-^ 

should not be balanced, but rather the far field should be bloated. As a result, the first few iterations are completely 
dominated by the time to compute the far field, but this is offset by the benefit of much cheaper iterations from then 
on. This is a simple but unexpected and counter-intuitive algorithmic consequence of using inexact gmres with fmm. 

To better evaluate the potential speedup in more general cases, we designed tests in the following two situations; 
(a) where higher accuracy of FMM-accelerated mat-vec product is needed (necessitating an initially higher p) , and (b) 
where the linear system demands a greater number of iterations to reach a smaller desired tolerance. 

Figurej^shows the timings obtained in a situation where the initial p is incrementally larger, representing applica¬ 
tions that demand higher accuracy. To minimize the effect of discretization error on total accuracy, we discretized the 
sphere with 32,768 panels. We also enforced a hxed number of 10 gmrfs iterations on this Ist-kind Laplace integral 
solver. Table 1^ shows the data for this test: the speed-up is leveling off at around 2x with a varying initial p. Note 
again that the shortest time-to-solution requires an unbalanced tree on the first iteration, with a bloated far field. This 
means that the first (high-p) iteration is much slower than the corresponding fixed-p case; the speed-up is thus purely 
a product of the low-cost later iterations. 

The final case looks at the situation where the application might demand different tolerances. Keeping the value 
of initial p fixed at 10, we ran several cases with 8,192 panels and increasingly stricter tolerance. Figure|^and Table 
l^show that the relaxed cases experience a speed-up between L5x and 2.Ox for different desired tolerances and the 
speed-up peaks at a tolerance of 10'®. 

3.2. Inexact GMRES for solving the Stokes equation 

Like in we start with a grid-convergence study to build confidence that the Stokes solver is correct and 
converges to the right solution at the expected rate. As an application of the Stokes equation, we chose low-Reynolds- 
number flow, using a spherical geometry for the grid-convergence study. This classical problem of fluid mechanics 
has an analytical solution that gives the drag force on the sphere as Fd - 6npRux, where p is the viscosity of the fluid, 
R is the Reynolds number and is the freestream velocity, taken in the x-direction. We solve a first-kind integral 
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Figure 8; Timings for solving a Ist-kind Laplace integral formulation on a sphere discretized with 32,768 panels, 
using a relaxed gmres with different initial values of p, compared with a fixed-p solver. The iteration count was 
capped at 10 for all cases. Time is measured by averaging the solving time of three identical runs. (Multi-threaded 
evaluator running on 6 cpu cores.) Plotting script and figure available under CC-BY ESI. 



Figure 9: Speed-ups for solving a Ist-kind Laplace integral problem on a sphere discretized with 8,192 panels, as 
the GMRES solver’s tolerance increases; p = 10 for all cases. Time is measured by averaging the solving time of three 
identical runs. (Multi-threaded evaluator running on 6 cpu cores.) Plotting script and hgure available under CC-BY 

El- 
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N 

Non-Relaxed 

A^CRIT ^solve 

Relaxed 

A^CRIT ^solve 

Speed-up 

Number 

of iterations 

8192 

400 

2.10 

300 

1.55 

1.35 

3 

32768 

400 

7.45 

200 

5.76 

1.29 

2 

131072 

400 

23.63 

200 

21.32 

1.11 

2 


Table 2: Speed-ups for the relaxation strategy on a Laplace 2nd-kind integral solver, p - 10, A: = 4, solver tolerance 

lo-^ 


P 

Relaxed 

A^CRIT ^solve 

Non-Relaxed 

A^CRIT ^solve 

Speedup 

5 

100 

3.94 

100 

7.53 

1.91 

8 

100 

6.20 

400 

12.38 

2.00 

10 

150 

9.08 

400 

19.96 

2.20 

12 

150 

12.70 

600 

25.41 

2.00 

15 

150 

22.05 

600 

37.77 

1.71 


Table 3: Speed-up when using a relaxation strategy on a Laplace Ist-kind integral solver, compared with a non-relaxed 
solver, with increasing value of the initial p (representing increased accuracy demands of the application), for a sphere 
discretized with 32,768 panels. (Multi-threaded evaluator running on 6 cpu cores.) 


equation for the traction force, t, by imposing u = (1,0,0)^ at the center of every panel, and compute the drag force 
with 



For all the tests, we set R - \ and p — 10“^, giving a drag force of Fd - 0.01885. We solve the integral 

problem using a boundary element method with fast-multipole-accelerated mat-vecs in a gmres solver. Based on the 
inexact Krylov method Gsiini, we choose an initial p of 16 to compute the hrst Krylov vectors with full accuracy, 
and use a 10“^ tolerance in the iterative solver. Figure shows that we observe convergence at the expected rate of 
(9(1/ ViV), for hrst-kind integral equations. 

Like with the Laplace equation, we need to show that the relaxation strategy does not hinder convergence and 
that there is a potential for speed-ups. We solved the problem of Stokes flow around a sphere, discretized with 8,192 
panels, and compared the residual history of a flxed-p solver with a relaxed gmres with an initial p - 16. Figure [TT] 
shows that the residual history (for the traction force) is similar for both relaxed and non-relaxed Krylov iterations, 
both methods reaching the stipulated tolerance of 10“^ after about 27 iterations. The number of iterations needed 
to converge is larger in the case of the Stokes equation compared to the Laplace equation, which bodes well for the 
speed-up that we could get from relaxation. Moreover, computing the spherical expansion for the Stokes kernel is 
equivalent to computing four Laplace expansions, which combines with the larger number of iterations to offer greater 
speed-ups. 

Figure [T^ illustrates clearly how we need to adjust the balance between near held and far held when using relax¬ 
ation strategies. Because most of the time is spent computing at the low values of p, we need to start with a bloated 
far held. The bar plot shows the breakdown of time spent in the p 2 p and m 2 l kernels for each iteration: although the 
first iteration is unbalanced, with far held taking about 8 times as much cpu time as near held, later iterations are close 
to balanced and the total time to solution is optimal. We ran extensive tests on the minimum value of p allowed in the 
relaxed solver, without degrading convergence and accuracy. Table [^presents some of the data from these tests: for 
finer surface discretizations, the error degrades when the relaxed value of p is allowed to drop below 3 or 4. To be 
conservative and avoid any degradation in accuracy, we used a p^i^ - 5 for all further cases with the Stokes equation. 


'when is set to 1, solution does not converge within 100 iterations for the two finer meshes. 
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Figure 10: Convergence of the boundary-integral solution for Stokes flow around a sphere, using a hrst-kind equation; 
p - 16, linear system solved to 10“^ tolerance. The relative error is with respect to the analytical solution for drag on 
a sphere: Fd - bnpRu^. Plotting script and figure available under CC-BY lfT9ll . 



Figure 11: Residual history solving for surface traction on the surface of a sphere (first-kind integral problem), with 
a 10“^ solver tolerance, 8192 panels, and initial p = 16 in the relaxed case (and throughout in the non-relaxed case). 
Plotting script and hgure available under CC-BY llT9l . 
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tolerance 

Relaxed 

A^CRIT 4olve 

Non-Relaxed 

A^CRIT 4olve 

Speedup 

10-5 

400 

1.65 

400 

3.13 

1.90 

10-'' 

300 

2.34 

400 

4.71 

2.01 

10-'' 

300 

4.07 

400 

7.93 

1.95 

10-** 

300 

5.77 

400 

9.87 

1.71 

10-9 

300 

8.14 

400 

12.51 

1.54 

10-*° 

300 

10.11 

400 

16.29 

1.61 

10-*' 

300 

12.24 

400 

17.88 

1.46 


Table 4; Speed-up of the relaxation strategy on solving a Laplace Ist-kind integral problem on a sphere discretized 
with 8,192 panels, with a varying solver’s tolerance; initial p fixed to a value of 10. (Multi-threaded evaluator running 
on 6 CPU cores.) 



Iteration 


Figure 12: Time breakdown between p 2 p and m 2 l when using a relaxation strategy for solving surface traction on the 
surface of a sphere, 10“^ solver tolerance, 8,192 panels, p - 16. Plotting script and hgure available under CC-BY 

HU. 


Figure 13 shows the speed-up resulting from the relaxed gmres iterations for three increasingly hner surface 
discretizations. For N - 32,768, speed-up is more than 3.Ox. We also observed a speedup between 1.7x and 3.5x 
under different gmres solver tolerances for N - 192, as shown in Figure [T4| 


3.3. Application to red blood cells in Stokes flow 

A number of medical applications will beneht from greater understanding of the microflows around red blood cells 
and of the mechanical effects on the cells from this flow. The most notable example is perhaps the deadly malaria 
infection, which makes red blood cells stiffer thus disrupting the flow of blood in capillaries EOll . Any design of a 
biomedical device that processes blood at the micrometer-scale needs to consider the mechanical behavior of blood 
at the cellular level ||2TI| . Blood is a dense suspension of mostly red blood cells and smaller concentrations of white 
blood cells and platelets. The flow regime in small capillaries is at very low Reynolds numbers, and thus completely 
dominated by viscous effects. Red blood cells are very flexible, so any physiologically realistic simulation should 
take into account their elastic deformations. But here we are only attempting to show the benefit of our relaxation 
strategy on the Stokes solver, and thus limit our study to the steady Stokes flow around a red-blood-cell geometry. The 
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Figure 13: Speed-up for solving first-kind Stokes equation on the surface of a sphere, varying 10 ^ solver tolerance, 
p - 16. Time is measured by averaging the solving time of three identical runs. Plotting script and figure available 
under CC-BYlfT9l. 



tolerance 


Figure 14: Speed-ups for solving a Ist-kind Stokes problem on a sphere discretized with 8,192 panels, as the gmres 
solver’s tolerance increases; p - \6 for all cases. Time is measured by averaging the solving time of three identical 
runs. (Multi-threaded evaluator running on 6 cpu cores.) Plotting script and figure available under CC-BY lfT9ll . 
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2048 panels 


8192 panels 

32768 panels 

Pmin 

Error 

it 

Error 

it 

Error 

it 

5 

4.61 X 

24 

2.48 X 10-2 

29 

1.34 X 10-2 

28 

4 

4.56 X 10^2 

25 

2.50 X 10-2 

29 

1.17 X 10-2 

28 

3 

4.58 X 10^2 

23 

3.57 X 10-2 

30 

1.74 X 10-2 

30 

2 

4.41 X 

25 

1.45 X 10-‘ 

39 

2.04 X 10-2 

39 

1 

4.35 X 10-2 

28 

3.42 X 10-2 

100* 

3.03 X 10-2 

0 

0 


Table 5: Effect of pmin on accuracy and convergence for Stokes flow around a sphere for differing values of N. Error 
is on the total drag force in the ;c-direction, F^- 



(a) 512 panels (b) 2048 panels 


Figure 15; Surface geometries of red blood cells obtained from transforming sphere triangulations using Equation 

•HD- 


unsteady problem of coupled Stokes flow and linear elasticity can be approached by repeated solution of boundary- 
integral problems at every time step, and would equally benefit from the speed-ups seen on a single Stokes solution. 

To create a surface discretization for a red blood cell, we start with a sphere discretized into triangular panels and 
transform every vertex v - v{x,y,z), with x,y,z 6 [- 1 , 1 ], into v' - v'(x',y',z(p')) using the formula presented in 
Ref. Ga: 

z(p) = +i^l-(^) (co + C 2 (^) +C4(^)j, ( 21 ) 

where x' - x ■ r, y' - y ■ r, p - yjx'^ + y'^, and the coefficients are: r = 3.91//m, Co = O.Slyum, C 2 = 7.83jt/m 
and C 4 = -4.39/im. Figure shows two examples of transformed shapes obtained from sphere triangulations using 
Equation ( |2T| l. 

A grid-convergence study using the geometry of a red blood cell requires that we use Richardson extrapolation 
||23]| . since we don’t have an analytical solution for this situation. We calculated the drag on a red blood cell in uniform 
Stokes flow using three surface meshes, consecutively refined by a constant factor c - 4. If the value /i corresponds 
to that obtained using the coarsest mesh and /2 and to those using consecutively refined ones, then we can obtain 
the extrapolated value approximating the exact solution with the following formula; 


f- 


/1/3-/2" 


( 22 ) 


/1-2/ 2+/3 

Table [^presents the computed values of the drag force obtained with three different meshes, of sizes N = 512, 
2048 and 8192. We can also obtain the observed order of convergence, p, as follows 


P = 



(23) 


where c is the refinement ratio between two consecutive meshes. With the values in Tablethe observed order of 
convergence comes out at 0.52, matching our expected rate of convergence of 0( V^)- Figure [^shows a plot of the 
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Figure 16: Observed convergence for Stokes flow around red blood cells, with respect to the extrapolated value of the 
drag coefficient, using Richardson extrapolation Il2^ . Plotting script and figure available under CC-BY Il24ll . 


error with respect to the extrapolated value, as a function of the mesh size. The plot includes the error obtained with 
four meshes, with the extrapolated value obtained using the first three coarser meshes. 


N I 


512 

-0.057 

2048 

-0.070 

8192 1 

-0.077 


Table 6: Surface mesh sizes and calculated drag force for the convergence study using a red blood cell in uniform 
Stokes flow. 


The calculations with increasingly finer surface meshes take more time to complete not only because the number 
of unknowns is larger, but also because they may require a greater number of iterations to converge to a desired 
residual. Figure[T7|shows that the number of iterations needed as the surface mesh varies from size N - 128 to 8,192 
increases from 18 to 39. For further refined meshes, the number of iterations remains about the same. This is an 
indication that the surface mesh is sufficiently refined with 8,192 panels for one red blood cell. 

For the next tests, we used several red blood cells in a sparse spatial arrangement. Realistic blood flows have 
densely packed red blood cells, but the purpose of this test is to simply demonstrate the boundary element solver with 
a larger problem. We set up a collection of red blood cells by making copies of a discretized cell, then randomly 
rotating each one, and shifting it spatially in each coordinate direction by a positive random amount that ensures they 
do not overlap. The resulting arrangement may look like that shown in Figure [TS] 

Using 2, 4 and 8 red blood cells, we looked at the number of iterations needed to converge to a solver tolerance 
of 10“^ when using three different surface mesh sizes on each cell: N = 2048, 8192 and 32,768. In all cases 
p - 16, pmin = 5. Figure [T^ shows that the number of iterations needed to converge increases sharply with the 
number of red blood cells in the system, while the number of panels per cell has a smaller effect in this range of mesh 
sizes. In all cases, the number of iterations is between 45 and 65, and thus we expect to see good speed-ups using the 
relaxation strategy. 

We completed several tests of the relaxation strategy, using between 1 and 64 red blood cells, and various mesh 
sizes on each cell. The common parameter settings for these tests are listed in Table[7]and, in each case, we chose the 
value of A^crit (establishing the balance between near and far field) to obtain the smallest time to solution for that run. 
The detailed results are listed in Tables 10 and 11 and Figure [20| shows a summary of the observed speed-ups, 
mostly hovering close to 4x. The largest problems, with a total of 131,072 panels (all cells combined), are atypical 
because in these cases we are unable to use an efficient sparse-matrix representation of the near field, due to the large 
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Figure 17; Number of iterations needed for the system to converge for increasingly refined surface meshes on one red 
blood cell; p - 16, target residual 10“^. Plotting script and figure available under CC-BY Il24l . 

.C* 

Figure 18; Surfaces for four red blood cells in a uniform Stokes flow. 


memory requirement. But this can also be seen as an advantage of the relaxation strategy, which leads to using smaller 
near fields and thus extends the range of problem sizes where we can use the efficient sparse-matrix representation. 
Indeed, if one needed to solve a problem of size N - 131,072 (on one cpu using six threads, like we do here), then 
the potential for a 7x speed-up is real. 

3.4. Reproducibility 

Our inexact gmres code is open-source and available on Github. Knowing the increasing importance of com¬ 
putational reproducibility, we provided the ‘"reproducibility package” — containing the running and post-processing 
scripts — to generate the figures in this result section. To save readers from potential headaches of dependency mis¬ 
matches, we also prepared a Dockerfile to setup up the software environment (a Docker container), under which the 
“reproducibility package” can be used to replicate our results. 


*Due to memory restrictions, a sparse-matrix representation of the near-held could not be used, resulting in a much slower p2p evaluation. 
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Figure 19: Number of iterations needed to converge to a desired residual of 10 ^ for systems with multiple red blood 
cells, discretized with different mesh sizes {p - 16). Plotting script and figure available under CC-BY Il24l . 


Variable 

Setting 

Pinitial 

16 

Pmin 

5 

solver tolerance 

10-^ 

Near-field 

Sparse matrix 

Threads 

6 

Solver 

GMRES 

Preconditioner 

None 


Table 7: Parameters for the tests of the relaxation strategy with red blood cells in uniform Stokes flow. 


4. Conclusion 

We have shown the first successful application of a relaxation strategy for fast-multipole-accelerated boundary 
element methods, based on the theory of inexact gmres. Testing the relaxation strategy on Laplace problems, we 
confirmed that it converges to the right solution, it provides moderate speed-ups over using a fixed p, and it leads to 
initially bloated far-fields to obtain the minimum time to solution. Exploring the performance advantage of relaxing 
the value of p as gmres iterations advance, we concluded that problems requiring high accuracy and/or resulting in 
more ill-conditioned linear systems will experience the best speed-ups, which for Laplace problems were between 
2.lx and 3.3x in our tests on a sphere with constant potential. 

In the case of the Stokes equation, the speed-ups that can be obtained using a relaxation strategy are larger, due 
to the fact that Stokes problems require both more iterations to converge (and the relaxed solver spends more time 
at low p) and more work per iteration (equivalent to four Laplace evaluations). We found that it’s important for 
Stokes problems to also enforce a minimum value of p to avoid accuracy or convergence degradation. Relaxed gmres 
iterations in this case reduced the time to solution by up to 3.0x in tests of Stokes flow around a sphere. We completed 
various tests for Stokes flow around red blood cells, with up to 64 cells. We studied numerical convergence in this 
situation using Richardson extrapolation and obtained an observed order of convergence of 0.52, close to the expected 
value of 1/2. The speed-ups resulting from the relaxation strategy in these tests were in most cases close to 4x. 
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Non-relaxed 

Relaxed 


N 

# unknowns 

A^crit 

^solve 

A'crit 

4olve 

Speedup 

2048 

6144 

400 

48.18 

150 

14.61 

3.30 

8192 

24576 

400 

234.44 

100 

69.03 

3.40 

32768 

98304 

400 

978.80 

100 

276.52 

3.54 

131072 

393216 

200 

5122.43* 

100 

1044.77 

4.90* 


Table 8; Timings and speed-up of the relaxation strategy on a single red blood cell in uniform Stokes flow, with the 
test parameters shown in Table 


2048 panels / cell 


N 

# unknowns 

Nc 

Non-relaxed 

A^CRIT ^solve 

Relaxed 

A^CRIT 4olve 

Speedup 

2048 

6144 

1 

400 

48.18 

150 

14.61 

3.30 

8192 

24576 

4 

400 

267.64 

100 

87.20 

3.07 

32768 

98304 

16 

400 

1358.07 

100 

444.55 

3.05 

131072 

393216 

64 

200 

7077.77* 

100 

1481.47 

4.78* 


Table 9; Timings and speed-up of the relaxation strategy with several red blood cells in uniform Stokes flow, each cell 
discretized with 2048 panels and test parameters shown in Table 


8192 panels / cell 


N 

# unknowns 

Nc 

Non-relaxed 

A^CRIT 4olve 

Relaxed 

A^CRIT 4olve 

speedup 

8192 

24576 

1 

400 

234.44 

100 

69.03 

3.40 

32768 

98304 

4 

400 

1494.37 

100 

371.06 

4.03 

131072 

393216 

16 

150 

12817.01* 

100 

1800.53 

7.12* 


Table 10: Timings and speed-up of the relaxation strategy with several red blood cells in uniform Stokes flow, each 
cell discretized with 8,192 panels and test parameters shown in Table 


32768 panels / cell 





Non-relaxed 

Relaxed 


N 

# unknowns 

Nc 

A'crit 

4olve 

AIcRIT 

^solve 

speedup 

32768 

98304 

1 

400 

978.80 

100 

276.52 

3.54 

131072 

393216 

4 

200 

7209.57* 

100 

1397.27 

5.16* 


Table 11: Timings and speed-up of the relaxation strategy with several red blood cells in uniform Stokes flow, each 
cell discretized with 32,768 panels and test parameters shown in TablejT] 
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Figure 20: Speed-ups of the relaxation strategy with several red blood cells in uniform Stokes flow, using different 
mesh sizes on each cell. The abscissa value corresponds to the total number of panels (all cells). Test parameters 
shown in Table Time is measured by averaging the solving time of three identical runs. 


Relaxing the truncation order p in the multipole expansions as Krylov iterations progress is one of those seemingly 
simple ideas that strike one as obvious a posteriori. Yet, as far as we know, it has not been tried before, nor has it 
been implemented in a bem. Given this method’s wide popularity in computational engineering, we look forward to 
many applications benefitting from healthy speed-ups from applying relaxed-p fmm. We showed that Stokes problems, 
in particular, can expect 4x speed-ups in large problems still fitting on one workstation. Linear elasticity problems 
should experience similar speed-ups (although we didn’t try them). This is pure algorithmic speed-up that should 
multiply with any hardware speed-ups obtained, for example, by moving computational kernels to gpus. 

Appendix A. Algorithm listings 


Algorithm 1 Matrix-vector multiplication. 
Initialize w 

for Collocation points / = 1 • • • do 

Wi 0 

for Integration panels j = 1 • • • do 

for Gauss quadrature points k = I ■ ■ ■ do 
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